ADS_Project1

Abstract

Ideology has become an indispensable part of modern life. Differences in ideology shape how we deal with life and how we think about problems. A quantitative analysis of textual data on philosophy is therefore a topic worth considering. This article uses Python to pose questions about philosophers, philosophical schools, the philosophical discourse itself, and the possible relationships between them, and then applies data analysis to the relevant data to answer them.

(Keywords: Philosophy, Data Analysis, Python)

Introduction

What is an ideology? Ideologies come in many kinds, including political, social, epistemological, and ethical. Recent analysis tends to posit that an ideology is a 'coherent system of ideas' that relies on a few basic assumptions about reality, which may or may not have any factual basis. Through this system, ideas become coherent, repeated patterns through the subjective, ongoing choices that people make. These ideas serve as the seed around which further thought grows. Belief in an ideology can range from passive acceptance to fervent advocacy. According to most recent analysis, ideologies are neither necessarily right nor wrong[1]. Different philosophical schools and ideas form around these ideologies.

(1) Import Data

We use pandas to import the data and then display it to look for questions (relationships) worth investigating.

On inspection, the data contains no garbled text.
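The import step might look like the following minimal sketch. The path of the real file is not given in this report, so an inline sample stands in for the CSV file; the sample rows and the exact column spellings (for example `sentence_length`) are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file. The column names follow the
# fields described in this report; the rows are made up for illustration.
SAMPLE_CSV = io.StringIO(
    "title,author,school,sentence_length\n"
    "Title A,Author A,school_x,120\n"
    "Title A,Author A,school_x,80\n"
    "Title B,Author B,school_y,200\n"
)

# With the real data this would be pd.read_csv("<path to the dataset>").
df = pd.read_csv(SAMPLE_CSV)

print(df.head())  # first rows, to eyeball the content for garbled text
print(df.shape)   # (number of rows, number of columns)
```

Printing the first rows is usually enough to spot encoding problems before any analysis starts.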

(2) Data Information

The imported data consists of the columns title, author, school, sentence length, tokenized text, and lemmatized_str. There are no null values, so we treat the data as reliable enough to use without further processing.

We can then view the composition of each school, including the authors who belong to it.
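A null check and the school composition can be sketched as follows. The rows below are made up and stand in for the imported data; only the column names `title`, `author`, and `school` come from the description above.

```python
import pandas as pd

# Made-up rows standing in for the imported data.
df = pd.DataFrame(
    {
        "title": ["Title A", "Title A", "Title B", "Title C"],
        "author": ["Author A", "Author A", "Author B", "Author C"],
        "school": ["school_x", "school_x", "school_y", "school_x"],
    }
)

# Number of missing values per column; all zeros means there is no null data.
null_counts = df.isnull().sum()
print(null_counts)

# Distinct authors within each school.
authors_by_school = df.groupby("school")["author"].unique()
print(authors_by_school)
```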

Method & Analysis

Our core method is exploratory data analysis (EDA). EDA is used by data scientists to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods[2]. In this part, we first pose some questions and then use EDA to analyze each one directly, with some explanation.

Questions and Answers

(a) Question 1: How popular is each ideological school in the dataset?

We ask this question because the raw data contains distinct groups of ideologies. We therefore want to know which school, author, and title appear most often, which may reflect the current ideological mainstream to some degree. In this section, we use bar plots, which show the count in each category intuitively and clearly.

From the distribution plots, we can see that the most frequent school, author, and title are Analytic, Aristotle, and Aristotle (completed work), respectively. Hence the Analytic school and Aristotle are the most mainstream in the dataset.
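The counting behind such bar plots can be sketched as below, assuming one row per sentence. The data is made up, and the use of matplotlib with the non-interactive `Agg` backend is our assumption, chosen so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; each row stands for one sentence in the dataset.
df = pd.DataFrame(
    {
        "school": ["school_x"] * 3 + ["school_y"] * 2 + ["school_z"],
        "author": ["Author A"] * 3 + ["Author C"] * 2 + ["Author D"],
        "title": ["Title 1"] * 3 + ["Title 3"] * 2 + ["Title 4"],
    }
)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
top = {}
for ax, column in zip(axes, ["school", "author", "title"]):
    counts = df[column].value_counts()  # sorted in descending order by default
    counts.plot(kind="bar", ax=ax, title=f"Sentences per {column}")
    top[column] = counts.idxmax()       # most frequent value in this column
fig.tight_layout()

print(top)
```

Note that these counts measure how many sentences each category contributes to the dataset, which is the sense of "popularity" used in this section.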

(b) Question 2: Which philosophical school is the quietest, and which school likes to speak the most?

In this section, we investigate how much philosophers speak, that is, which schools tend to speak at length and which prefer to speak only a little.

We focus on sentence length and the number of tokens by school, and from these infer how much each school speaks. The length of speech, whether measured as sentence length or as number of tokens, is a significant feature of a person's speaking preference and may also reflect their personality to a certain extent.

We can roughly see the shape of the histogram, but the distribution is heavily skewed, so we apply a log transform to the variable.

Based on the graph, the maximum sentence length is 2649 and the minimum is 20. The mean is 150, which can serve as a benchmark for judging sentence length.
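The summary statistics and the log transform can be sketched as follows. The five lengths are made-up values chosen to be right-skewed, not values from the dataset, and the natural logarithm is an assumption, since the base of the log is not stated.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed sentence lengths (all positive, so the log is defined).
lengths = pd.Series([20, 40, 80, 160, 2500], name="sentence_length")

# Minimum, maximum, and mean of the raw variable.
summary = lengths.agg(["min", "max", "mean"])
print(summary)

# The log transform compresses the long right tail.
log_lengths = np.log(lengths)

# Sample skewness before and after the transform.
print(lengths.skew(), log_lengths.skew())
```

A histogram of `log_lengths` is then easier to read than one of the raw lengths, because the few very long sentences no longer stretch the horizontal axis.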

After summarizing sentence length, we then analyze the 'tokens' data.

We find that the 'Number of Tokens' is concentrated around 25, which is the mean value, and the frequency decreases on both sides of this peak.

We then investigate sentence length and number of tokens by school using violin and box plots.

From the violin and box plots, we find that the continental school tends to have the highest values for both number of tokens and sentence length, which suggests that it prefers to speak or cite more. By contrast, the Plato school is the quietest of all schools, which may reflect a preference for thinking more and speaking less.
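A per-school comparison of this kind can be sketched as below. The rows and the column name `n_tokens` are hypothetical, and drawing the box plot on top of the violin plot with matplotlib is our choice for the illustration, not necessarily how the original figures were produced.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; `n_tokens` is a hypothetical column holding the token count.
df = pd.DataFrame(
    {
        "school": ["school_x"] * 3 + ["school_y"] * 3,
        "sentence_length": [200, 260, 320, 60, 90, 120],
        "n_tokens": [35, 44, 56, 10, 16, 22],
    }
)

# The median per school gives a single number to rank schools by.
medians = df.groupby("school")[["sentence_length", "n_tokens"]].median()
print(medians.sort_values("sentence_length", ascending=False))

schools = sorted(df["school"].unique())
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, column in zip(axes, ["sentence_length", "n_tokens"]):
    # One array of values per school, in the same order as `schools`.
    samples = [df.loc[df["school"] == s, column].to_numpy() for s in schools]
    ax.violinplot(samples, showmedians=True)  # shape of each distribution
    ax.boxplot(samples, widths=0.15)          # quartiles drawn on top
    ax.set_xticks(range(1, len(schools) + 1))
    ax.set_xticklabels(schools)
    ax.set_title(column)
fig.tight_layout()
```

Ranking by the median rather than the mean keeps a handful of extremely long sentences from deciding which school counts as the most talkative.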

(c) Question 3: What are the speaking habits of philosophers from different schools?

In this section, we investigate which words philosophers use most, by school.

We first use word clouds to see the words each school uses. The bigger a word appears in the graph, the more frequently it is used.

c-1. Words that different schools like to say

We can then find the words that each school uses most.

From the words a school uses often, we can see its character to some extent. For example, capitalism likes to mention "employment" and "money", whereas communism prefers to talk about "labour" and "commodity".
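A word cloud sizes each word by its frequency, so the underlying per-school counts can be sketched with the standard library alone. The token lists and their counts are made up, and the stopword list is a small hand-picked assumption rather than the list used in the analysis.

```python
from collections import Counter

# Made-up token lists per school; the counts are for illustration only.
tokens_by_school = {
    "school_x": ["money", "the", "employment", "money", "of"],
    "school_y": ["labour", "the", "commodity", "labour", "and"],
}

# Small hand-picked stopword list (an assumption).
STOPWORDS = {"the", "of", "and"}


def top_words(tokens, n=2):
    """Return the n most frequent non-stopword tokens with their counts."""
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


for school, tokens in tokens_by_school.items():
    print(school, top_words(tokens))
```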

In addition, the frequency of words across all schools is also important for understanding the habits of philosophers.

c-2. Words that all schools like to say

We then plot part of the output.

We can see that the most frequent words are all pronouns, conjunctions, prepositions, copulas, or modal verbs, which carry little meaning. By observation, we find that words begin to carry distinctive meaning at around the 150th position in the frequency ranking.

We also use a pie chart to show the proportions of the top 20 most used words.

From the graph and the printed output, we can see that, apart from pronouns, conjunctions, prepositions, copulas, and modal verbs, philosophers also like to say "power", "god", "mind", "people", "existent", and so on.
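The overall ranking and the proportions shown in a pie chart can be sketched as follows. The token list is made up, the function-word list is a hand-picked assumption, and `top_n` is set to 3 here only to keep the example small; the analysis above uses 20.

```python
from collections import Counter

# Made-up tokens pooled across all schools.
tokens = ["the", "mind", "of", "the", "people", "is", "the", "mind", "god", "is"]

# Hand-picked function words to set aside (an assumption).
FUNCTION_WORDS = {"the", "of", "is"}

counts = Counter(tokens)
ranked = counts.most_common()  # (word, count) pairs, most frequent first
print(ranked)

# Keep only content words, then take the top_n of them.
top_n = 3
content_counts = Counter(
    {word: c for word, c in counts.items() if word not in FUNCTION_WORDS}
)
top = content_counts.most_common(top_n)

# Share of each top word within the top_n group; these are the pie slices.
total = sum(c for _, c in top)
shares = {word: c / total for word, c in top}
print(shares)
```

The shares sum to one by construction, so they describe each word's weight within the selected group rather than within the whole corpus.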

Conclusion

  1. We find that Analytic, Aristotle, and Aristotle (completed work) may be the most popular school, author, and title, respectively, because each ranks first by frequency in its category.

  2. We find that the continental school tends to speak the most, while the Plato school is the quietest.

  3. Every school has its own speaking preferences and preferred words, but all of them may like to mention "people", "god", "mind", "power", "existent", and so on.

Development

  1. We do not interpret the conclusions against their background, that is, we do not link the data output to real-world context.
  2. We do not explore in depth which factors affect the length of speech, among other issues.
  3. For part c-2, it is possible that a few schools dominate the count of a word that is not used by all schools.
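One possible way to check the third point is to ask, for each frequent word, how many schools use it and how much of its total count comes from a single school. The sketch below uses made-up token lists, and the helper names `schools_using` and `dominant_share` are hypothetical.

```python
# Made-up token lists per school.
tokens_by_school = {
    "school_x": ["god", "mind", "god", "god"],
    "school_y": ["mind", "people"],
    "school_z": ["mind", "power"],
}


def schools_using(word):
    """Return the sorted names of the schools whose tokens contain the word."""
    return sorted(s for s, toks in tokens_by_school.items() if word in toks)


def dominant_share(word):
    """Largest fraction of the word's occurrences that comes from one school."""
    per_school = [toks.count(word) for toks in tokens_by_school.values()]
    total = sum(per_school)
    return max(per_school) / total if total else 0.0


for word in ["god", "mind"]:
    print(word, schools_using(word), dominant_share(word))
```

A word with a dominant share close to 1 reflects one school's vocabulary rather than a habit shared by all schools.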

References

[1] Wikipedia, "Ideology". https://en.wikipedia.org/wiki/Ideology

[2] IBM, "Exploratory data analysis". https://www.ibm.com/cloud/learn/exploratory-data-analysis